A Unified Approach to Algorithms with a Suboptimality Test in Discounted Semi-Markov Decision Processes

Author

  • Katsuhisa Ohno
Abstract

This paper deals with computational algorithms for obtaining the optimal stationary policy and the minimum cost of a discounted semi-Markov decision process. Van Nunen [23] has proposed a modified policy iteration algorithm with a suboptimality test of MacQueen type, where the modified policy iteration algorithm is the policy iteration method with the policy evaluation routine replaced by a finite number of iterations of successive approximations; it includes the method of successive approximations and the policy iteration method as special cases. This paper devises a modified policy iteration algorithm with a suboptimality test of Hastings and Mello type and proves that it constructs a finite sequence of policies whose last element is either a unique optimal policy or an ε-optimal policy. Moreover, a new notion of equivalent decision processes is introduced, and many iterative methods for solving systems of linear equations, such as the Jacobi method, the simultaneous overrelaxation method, the Gauss-Seidel method, the successive overrelaxation method, stationary Richardson's method and so on, are shown to convert the original semi-Markov decision process into equivalent decision processes. Various transformed algorithms are derived by applying the modified policy iteration algorithm with the suboptimality test to these equivalent decision processes. Numerical comparisons are made for Howard's automobile replacement problem. They show that the modified policy iteration algorithm with the suboptimality test is much more efficient than van Nunen's algorithm and is superior to the policy iteration method, linear programming and some of the transformed algorithms.
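As a rough, self-contained illustration (not the paper's own pseudocode), the sketch below implements modified policy iteration with a MacQueen-type suboptimality test for an ordinary discounted MDP, i.e. a semi-Markov decision process specialized to a constant discount factor beta; in the semi-Markov case beta would become state- and action-dependent. The Hastings-Mello test studied in the paper is a tighter variant of the bound used here, and all names (P, c, k, eps) are illustrative assumptions.

```python
import numpy as np

def modified_policy_iteration(P, c, beta, k=5, eps=1e-6, max_iter=10_000):
    """Modified policy iteration with a MacQueen-type suboptimality test.

    P    -- (S, A, S) array, P[s, a, t] = transition probability s -> t under a
    c    -- (S, A) array of one-step expected costs
    beta -- discount factor in (0, 1); constant here, state- and
            action-dependent in the semi-Markov case
    k    -- number of partial policy-evaluation sweeps per outer iteration
            (k = 0 gives successive approximations; k -> infinity gives
            the classical policy iteration method)
    """
    S, A = c.shape
    v = np.zeros(S)
    active = np.ones((S, A), dtype=bool)      # actions not yet proven suboptimal
    factor = beta / (1.0 - beta)

    for _ in range(max_iter):
        Q = c + beta * P.dot(v)               # Q[s, a] = c(s,a) + beta * E[v]
        Qm = np.where(active, Q, np.inf)      # ignore eliminated actions
        pi = Qm.argmin(axis=1)                # improved (greedy) policy
        Tv = Qm.min(axis=1)
        delta = Tv - v
        lo, hi = delta.min(), delta.max()

        # Suboptimality test: v* lies in [Tv + factor*lo, Tv + factor*hi],
        # so action a is suboptimal at s once even its best-case cost
        # Q[s, a] + factor*lo exceeds the worst-case optimal cost at s.
        active &= Q + factor * lo <= (Tv + factor * hi)[:, None]

        # eps-optimality: the bounds bracket v* to within eps.
        if factor * (hi - lo) <= eps:
            return pi, Tv + factor * 0.5 * (lo + hi)

        # Partial policy evaluation: k Jacobi-like sweeps of successive
        # approximation under the fixed policy pi.
        P_pi = P[np.arange(S), pi]            # (S, S) transition matrix under pi
        c_pi = c[np.arange(S), pi]
        v = Tv
        for _ in range(k):
            v = c_pi + beta * P_pi.dot(v)

    return pi, v

def gauss_seidel_sweep(P_pi, c_pi, beta, v):
    """One in-place Gauss-Seidel sweep for the evaluation equations
    v = c_pi + beta * P_pi v.  Substituting sweeps like this one for the
    Jacobi-like sweeps above is, roughly, what passing to an 'equivalent
    decision process' amounts to."""
    for s in range(len(v)):
        v[s] = (c_pi[s] + beta * P_pi[s].dot(v)
                - beta * P_pi[s, s] * v[s]) / (1.0 - beta * P_pi[s, s])
    return v
```

In this reading, the Jacobi, Gauss-Seidel, overrelaxation and Richardson transformations mentioned in the abstract correspond to different ways of rearranging the evaluation equations before iterating, each yielding a transformed algorithm that can be combined with the same suboptimality test.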

Similar Articles

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be grouped into levels. In each level, smaller problems called restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...

Semi-Markov decision problems and performance sensitivity analysis

Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view; and perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop PA theory for semi-Markov processes (S...

Continuous Time Discounted Jump Markov Decision Processes: A Discrete-Event Approach

This paper introduces and develops a new approach to the theory of continuous time jump Markov decision processes (CTJMDP). This approach reduces discounted CTJMDPs to discounted semi-Markov decision processes (SMDPs) and eventually to discrete-time Markov decision processes (MDPs). The reduction is based on the equivalence of strategies that change actions between jumps and the randomized stra...

Risk-Sensitive Markov Control Processes

We introduce a unified framework to incorporate risk in Markov decision processes (MDPs), via prospect maps, which generalize the idea of coherent/convex risk measures in mathematical finance. Most of the existing risk-sensitive approaches in the literature on decision-making problems are contained in the framework as special instances. Within the framework, we solve the optima...

Model-Building Adaptive Critics for Semi-Markov Control

Adaptive (or actor) critics are a class of reinforcement learning algorithms. Generally, in adaptive critics, one starts with randomized policies and gradually updates the probability of selecting actions until a deterministic policy is obtained. Classically, these algorithms have been studied for Markov decision processes under model-free updates. Algorithms that build the model are often more...

Journal:

Volume   Issue

Pages  -

Publication date: 2009